A concept-based approach for indexing documents in IR

Authors

  • Mustapha Baziz
  • Mohand Boughanem
  • Salam Traboulsi
Abstract

This paper addresses two important problems related to the use of semantics in IR. The first concerns the representation of document semantics and its proper use in retrieval. The second is the integration of semantic-based retrieval with "traditional" keyword-based retrieval. The proposed approach represents the document content by its best semantic network, called the document semantic core, which is built in two main steps. The first step extracts concepts (mono- and multiword) from a document, driven by an external general-purpose ontology, namely WordNet. The second step builds the best semantic network by performing a global disambiguation of the extracted concepts with respect to the document. The selected concept senses form the nodes of the semantic network, and the similarity values between them label the arcs. The resulting scored concept senses are then used for conceptual indexing in Information Retrieval.

KEYWORDS: Information Retrieval, semantic representation of documents, semantic similarity measures, conceptual indexing, ontologies, WordNet.
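The two steps described in the abstract can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation, and several parts are assumptions: the `SENSES` dictionary is a hypothetical toy sense inventory standing in for WordNet, `similarity` uses a Jaccard overlap of hypothetical ancestor labels in place of a real WordNet similarity measure, global disambiguation is done by exhaustively choosing the sense combination with the highest total pairwise similarity, and each selected sense is scored by the sum of the similarities on its arcs. The paper's exact scoring formulas may differ.

```python
import re
from itertools import combinations, product

# Hypothetical toy sense inventory standing in for WordNet: each term maps to
# candidate senses, and each sense to a set of (hypothetical) ancestor labels.
SENSES = {
    "bank": {"bank#finance": {"institution", "organization", "finance"},
             "bank#river": {"slope", "land", "geography"}},
    "interest rate": {"interest_rate#1": {"rate", "finance", "measure"}},
    "loan": {"loan#1": {"debt", "finance", "transfer"}},
    "river": {"river#1": {"stream", "water", "geography"}},
}
FEATURES = {s: f for senses in SENSES.values() for s, f in senses.items()}
MAX_TERM_LEN = max(len(term.split()) for term in SENSES)


def extract_concepts(text):
    """Step 1: greedy longest-match extraction of mono- and multiword terms."""
    words = re.findall(r"[a-z]+", text.lower())
    found, i = [], 0
    while i < len(words):
        for n in range(min(MAX_TERM_LEN, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in SENSES:
                found.append(candidate)
                i += n
                break
        else:  # no known term starts at position i
            i += 1
    return list(dict.fromkeys(found))  # unique, in order of first occurrence


def similarity(s1, s2):
    """Jaccard overlap of ancestor sets (assumed stand-in for a WordNet measure)."""
    f1, f2 = FEATURES[s1], FEATURES[s2]
    return len(f1 & f2) / len(f1 | f2)


def semantic_core(concepts):
    """Step 2: pick one sense per concept maximizing total pairwise similarity."""
    best, best_score = (), -1.0
    # Exhaustive search over sense combinations (exponential; fine for a toy).
    for choice in product(*(SENSES[c] for c in concepts)):
        score = sum(similarity(a, b) for a, b in combinations(choice, 2))
        if score > best_score:
            best, best_score = choice, score
    nodes = dict(zip(concepts, best))  # concept -> selected sense (graph nodes)
    arcs = {(a, b): similarity(a, b) for a, b in combinations(best, 2)}
    # Hypothetical index weight: sum of the similarities on a node's arcs.
    weights = {s: sum(w for pair, w in arcs.items() if s in pair) for s in best}
    return nodes, arcs, weights
```

For "The bank raised the interest rate on every loan.", the extracted concepts are `bank`, `interest rate` and `loan`, and `bank#finance` is selected because it shares the "finance" label with the other two senses; for "the bank of the river", `bank#river` wins instead. In a real system, candidate senses and similarities would come from WordNet itself, for example through NLTK's `wordnet.synsets()` and `Synset.path_similarity()`.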

Similar resources

Document Clustering: Before and After the Singular Value Decomposition

Document clustering is the problem of measuring similarity between documents and grouping similar documents together. Information Retrieval (IR) is the problem of comparing a query with a collection of documents to locate a set of documents relevant to that query. In the vector space IR model, a query is treated as a document consisting of a few terms. Therefore, in both clustering and retr...

Semantic Indexing Approach of a Corpora Based On Ontology

The growth in the volume of text data such as books and articles in libraries over the centuries has made it necessary to establish effective mechanisms to locate them. Early techniques such as abstraction, indexing and the use of classification categories marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of defining models a...

Factors affecting the effectiveness of biomedical document indexing and retrieval based on terminologies

OBJECTIVE The aim of this work is to evaluate a set of indexing and retrieval strategies based on the integration of several biomedical terminologies on the available TREC Genomics collections for an ad hoc information retrieval (IR) task. MATERIALS AND METHODS We propose a multi-terminology based concept extraction approach to select the best concepts from free text by means of voting techniq...

Using Concept-Based Indexing to Improve Language Modeling Approach to Genomic IR

Genomic IR, characterized by its highly specific information needs, severe synonymy and polysemy problems, long term names and rapidly growing literature, is challenging the IR community. In this paper, we focus on addressing the synonymy and polysemy issue within the language model framework. Unlike the ways translation model and traditional query expansion techniques approach this issue, we i...

An IE and IR Approach to deal with Geographic Information Scope in Textual Documents

We briefly present requirements and a methodology of semantic annotation for automatic indexing and geo-referencing of text documents. The first evaluation results show that combining a spatial approach with a classical (statistics-based) IR approach significantly improves retrieval accuracy, particularly for “realistic” queries.

Publication date: 2005